The ability of legume crops to fix atmospheric nitrogen via a symbiotic association with soil 
rhizobia makes them an essential component of many agricultural systems. Initiation of 
this symbiosis requires protein phosphorylation-mediated signaling in response to rhizobial 
signals named Nod factors. Medicago truncatula (Medicago) is the model system for study- 
ing legume biology, making the study of its phosphoproteome essential. Here, we describe 
the Medicago PhosphoProtein Database (MPPD; http://phospho.medicago.wisc.edu), a 
repository built to house phosphoprotein, phosphopeptide, and phosphosite data spe- 
cific to Medicago. Currently, the MPPD holds 3,457 unique phosphopeptides that contain 
3,404 non-redundant sites of phosphorylation on 829 proteins. Through the web-based 
interface, users are allowed to browse identified proteins or search for proteins of inter- 
est. Furthermore, we allow users to conduct BLAST searches of the database using both 
peptide sequences and phosphorylation motifs as queries. The data contained within the 
database are available for download to be investigated at the user's discretion. The MPPD 
will be updated continually with novel phosphoprotein and phosphopeptide identifica- 
tions, with the intent of constructing an unparalleled compendium of large-scale Medicago 
phosphorylation data. 
INTRODUCTION 

Legumes, also known as Fabaceae, are a very large and economi- 
cally important group of plants (Graham and Vance, 2003). They 
are used as food crops, forages, and green manure throughout 
the world. Most legumes can develop a nitrogen-fixing associa- 
tion with soil bacteria known as rhizobia, which results in the 
formation of root nodules (Jones et al., 2007). Rhizobia thrive 
inside these nodules and fix atmospheric nitrogen in exchange 
for a carbon source (Venkateshwaran and Ane, 2011). Many 
legumes also establish symbiotic associations with arbuscular 
mycorrhizal fungi which facilitate the acquisition of nutrients 
(phosphorus, nitrogen, etc.) and provide some protection against 
environmental stresses (Ruiz-Lozano et al., 1995; Schutzendubel 
and Polle, 2002; Bonfante and Genre, 2010). Medicago truncat- 
ula (Medicago) is a well-established model for studying legume 
biology and, in particular, the molecular mechanisms mediat- 
ing symbiotic associations (Cook, 1999). The Medicago research 
community has developed many genetic, proteomic, and genomic 
tools, including the recent release of its genome sequence 
(Thoquet et al., 2002; Ane et al., 2008; Colditz and Braun, 2010; 
Young et al., 2011). 

Genetic studies in Medicago have allowed a precise dissection 
of the molecular mechanisms controlling the early steps of the 
rhizobia-legume symbiosis (Riely et al., 2004, 2006). In response 
to plant signals, rhizobia produce Nod factors, which are 
lipochitooligosaccharides required for both root colonization 
and nodule organogenesis (Denarie et al., 1996; Brelles-Marino 
and Ane, 2008). Nod factors are recognized by LysM-receptor 
kinases on the plasma membrane, including Nod Factor Perception 
(NFP) and LYK3 (Ben Amor et al., 2003; Arrighi et al., 2006; 
Smit et al., 2007). Does not Make Infections 2 (DMI2) is another 
receptor-like kinase residing on the plasma membrane; it carries 
leucine-rich repeats (LRR) and acts downstream of NFP 
(Endre et al., 2002; Limpens et al., 2005). The signals are 
transduced from the plasma membrane to the nucleus, where they 
activate oscillations of calcium concentrations (calcium spiking) 
and a calcium/calmodulin-dependent protein kinase (CCaMK) 
named DMI3 (Ehrhardt et al., 1996; Levy et al., 2004; Peiter 
et al., 2007; Shimoda et al., 2012). Proteins that interact with and 
are phosphorylated by protein kinases such as LYK3, DMI2, and 
DMI3 have been identified by targeted studies and mediate 
symbiotic signaling (Kevei et al., 2007; Messinese et al., 2007; 
Lefebvre et al., 2010; Mbengue et al., 2010; Horvath et al., 2011; 
Chen et al., 2012). Hence, protein phosphorylation plays a central 
role in symbiotic signaling, as it does in many cellular signaling 
events that mediate developmental processes and responses to 
external stimuli (Laugesen et al., 2006; Peck, 2006; Huber, 2007). 

Except for a few studies (Laugesen et al., 2006; Lima et al., 
2006; Wienkoop et al., 2008), the Medicago phosphoproteome 
had not been studied extensively in vivo until we published a 
large-scale phosphoproteomic study in 2010 (Grimsrud et al., 
2010). This work identified phosphorylation sites on proteins 
isolated from Medicago roots (from whole cell lysate and 
membrane-enriched fractions) using immobilized metal affinity 
chromatography (IMAC) and tandem mass spectrometry. 
The data collected from this large-scale study were used to 
create an online Medicago PhosphoProtein Database (MPPD; 
http://www.phospho.medicago.wisc.edu). The salient features of 
the MPPD are discussed below. 

MASS SPECTROMETRY ANALYSIS OF THE MEDICAGO 
PHOSPHOPROTEOME 

The MPPD was populated using the phosphoproteomic workflow 
presented in Figure 1. Proteins were isolated from Medicago 
plant tissue and digested with trypsin, Lys-C, Glu-C, Arg-C, or 
Asp-N. To reduce sample complexity, the resulting peptides were 
fractionated by strong cation exchange chromatography. 
Phosphopeptides were then enriched by IMAC and analyzed using 
an electron transfer dissociation (ETD)-enabled LTQ Orbitrap 
mass spectrometer (Thermo-Fisher). To increase proteome 
coverage, both collisionally activated dissociation (CAD) and ETD 
(Syka et al., 2004) were used for peptide fragmentation. Spectra 
were searched against a Medicago protein database using the 
Open Mass Spectrometry Search Algorithm (OMSSA; Geer et al., 
2004). Identifications were then filtered to a 1% false discovery 
rate (FDR) at both the peptide and protein level. This analysis 
produced 3,457 unique phosphopeptides, 829 unique proteins, and 
3,404 non-redundant sites of phosphorylation (Grimsrud et al., 
2010). All of these data are contained within the MPPD and are 
freely available for download. 
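
The text above does not say how the 1% FDR threshold was computed. Purely as an 
illustration, the sketch below assumes a target-decoy strategy (an assumption, not 
a detail given here): identifications are ranked by e-value, and the list is cut at 
the deepest rank where the ratio of decoy to target hits is still at or below the 
threshold. The `Psm` record and its field names are hypothetical. 

```python
from dataclasses import dataclass


@dataclass
class Psm:
    """Hypothetical peptide-spectrum match record (not an OMSSA data structure)."""
    peptide: str
    e_value: float
    is_decoy: bool


def filter_to_fdr(psms, max_fdr=0.01):
    """Return target PSMs passing an estimated FDR threshold.

    Assumes a target-decoy search: FDR at a given rank is estimated as
    (decoy hits) / (target hits) among all PSMs ranked at or above it.
    """
    ranked = sorted(psms, key=lambda p: p.e_value)  # best (lowest) e-value first
    targets = decoys = 0
    cutoff = 0  # number of ranked PSMs to keep
    for rank, psm in enumerate(ranked, start=1):
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the deepest rank at which the estimate is still acceptable.
        if targets > 0 and decoys / targets <= max_fdr:
            cutoff = rank
    return [p for p in ranked[:cutoff] if not p.is_decoy]
```

The same idea can be applied at the protein level by ranking protein groups by 
their score instead of ranking individual peptides by e-value. 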

OVERVIEW OF THE MEDICAGO PHOSPHOPROTEIN 
DATABASE 

The MPPD (http://phospho.medicago.wisc.edu) is a web-based 
resource that allows users to search for a particular protein of 
interest, BLAST (Basic Local Alignment Search Tool; Altschul et al., 
1990) a protein sequence, browse the entire phosphoproteomic 
database, and download the mass spectrometric data derived from 
a phosphoproteomic analysis of Medicago. The MPPD home page 
displays a description of the database and provides a reference 
to the publication from which the data were derived (Figure 2A). 
The homepage also provides detailed instructions for searching 
the database as well as an explanation of the various options 
for handling the data (i.e., search, browse, download). Here, we 
discuss each of these features and provide a guide to navigating 
the MPPD. 

FIGURE 1 | Proteomic workflow used to populate the Medicago 
Phosphoproteome Database. Proteins were isolated from Medicago 
root tissue and digested with trypsin, Lys-C, Glu-C, Arg-C, or Asp-N 
to create peptides. These peptides were then fractionated by strong 
cation exchange (SCX), enriched for phosphopeptides by immobilized 
metal affinity chromatography (IMAC), and sampled using nHPLC- 
MS/MS. Peptides were fragmented via electron transfer dissociation 
(ETD) or collisionally activated dissociation (CAD). The results 
of this workflow are displayed in the table labeled phosphopeptide 
results. Modified from Grimsrud et al. (2010). 
[Workflow panels: Enzymatic Digestion; SCX Fractionation; Phospho 
Enrichment; nHPLC-ESI Mass Spectrometry; Tandem MS (ETD or CAD). 
Phosphopeptide Results: peptides (unique), 3,457; proteins (unique), 
829; sites (non-redundant), 3,404.] 

Frontiers in Plant Science | Plant Proteomics 
June 2012 | Volume 3 | Article 122 | 2 

Rose et al. 
Medicago PhosphoProtein Database 

FIGURE 2 | Homepage and search page of the Medicago PhosphoProtein 
Database. (A) The MPPD homepage contains a description of the database 
as well as instructions on how to search the database. (B) Search result for 
the protein Interacting Protein of DMI3 (IPD3). A protein description search 
returned results listing the protein description, protein coverage, sequence 
redundancy, number of protein groups, number of proteins and peptides 
within a protein group, and the protein group p-score. Modified amino acids 
are displayed in lowercase with green and red letters representing localized 
and unlocalized phosphosites, respectively. (C) Selecting amino acid s161 
displays information related to the mass spectrometric identification of the 
peptide used to identify the given phosphosite. This page displays the e-value 
and p-value assigned by OMSSA, the precursor charge state, precursor 
theoretical mass, precursor experimental mass, precursor mass error, 
number of matching fragments, number of possible fragments, whether all 
phosphosites were localized, the dissociation method, the phosphopeptide 
sequence, and the reference in which the phosphosite was described. 
[Screenshots not reproduced: homepage with the database description, the 
tools menu (Search, Browse, Download), the search form with the search 
term IPD3, and the search results.] 

SEARCHING THE MEDICAGO PHOSPHOPROTEIN DATABASE 

The MPPD allows users to search the database using three different 
criteria: (1) protein description, (2) BLAST protein sequence, 
and (3) peptide sequence/phosphorylation motif. For all search 
types, multiple queries can be executed at one time. The protein 
description search allows users to query the database using a 
gene accession number, protein name, or any word contained 
in the protein description. Figure 2B displays the results of a 
search for Interacting Protein of DMI3 (IPD3). 
IPD3 interacts with and is phosphorylated by DMI3 and is a 
key regulator of legume-rhizobia symbiotic signaling (Messinese 
et al., 2007; Horvath et al., 2011). The MPPD serves as a repository 
of key information about the phosphorylation sites 
of several proteins involved in various developmental processes 
in Medicago. Search results for each query display the protein 
description, protein coverage, and sequence redundancy (i.e., 
the percent of the protein sequenced by multiple peptides). When 
analyzing shotgun proteomic data, short peptides may match 
many proteins; this is known as the protein inference 
problem (Nesvizhskii and Aebersold, 2005). To ensure accurate 
reporting of the number of protein identifications, proteins with 
shared peptides are grouped together in protein groups. The light 
blue bar in Figure 2B contains information regarding the protein 
group for IPD3. In this case, there was one protein group 
that contains two separate protein entries from the Medicago 
database. This group contained six peptides and received a 
p-score of 2.94e-33. Briefly, the p-score is calculated by multiplying 
the p-values of all peptides contained within the protein group 
and is used to conduct false discovery rate analysis at the protein 
level. In addition, the complete protein sequence is displayed with 
identified sites of phosphorylation appearing as lowercase letters 
colored green or red, with the colors representing localized and 
unlocalized sites of phosphorylation, respectively. Amino acids 
highlighted in green (localized sites of phosphorylation) indicate 
that the mass spectrometric data provided clear evidence for 
phosphorylation of the given residue. Amino acids highlighted in red 
indicate that mass spectrometric analysis determined the presence 
of phosphorylation, but the site of phosphorylation could 
not be assigned to a specific residue with high confidence. In the 
case of unlocalized phosphorylation sites, all possible sites are 
highlighted in red. 
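
The p-score described above is a product of peptide p-values, which can become 
extremely small for groups with many peptides. The following minimal sketch shows 
the product and an equivalent log10 form that avoids floating-point underflow. 
The function names are hypothetical, and the p-values used in any call are 
illustrative rather than taken from the database. 

```python
import math


def group_p_score(peptide_p_values):
    """Multiply the p-values of all peptides in a protein group."""
    if not peptide_p_values:
        raise ValueError("a protein group needs at least one peptide p-value")
    return math.prod(peptide_p_values)


def group_log10_p_score(peptide_p_values):
    """Same score on a log10 scale; sums logs instead of multiplying."""
    if not peptide_p_values:
        raise ValueError("a protein group needs at least one peptide p-value")
    if any(p <= 0 for p in peptide_p_values):
        raise ValueError("p-values must be positive to take a logarithm")
    return sum(math.log10(p) for p in peptide_p_values)
```

Working on the log scale also makes it easy to rank protein groups, since a more 
negative log10 p-score corresponds to a smaller product. 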

Selecting an amino acid, such as s161, will display the characteristics 
of the spectra used to identify the given phosphorylation site 
(Figure 2C). This information includes the e-value and p-value 
assigned by OMSSA, precursor charge state, precursor theoretical 
neutral mass, precursor experimental neutral mass, precursor 
mass error, number of matching fragments, number of total 
fragments, localization state of all phosphosites, the MS/MS 
dissociation method, phosphopeptide sequence, and reference to the 
paper which describes this identification. The e-value provides a 
rough characterization of the spectral quality: a lower e-value 
indicates that the given MS/MS spectrum is more likely to be 
associated with the phosphopeptide in question. The precursor mass 
error represents the deviation of the experimental mass from the 
theoretical mass. High accuracy mass analyzers (e.g., Orbitrap, 
FT-ICR, and TOF) enable greater specificity when searching protein 
databases, and generally a precursor mass error less than 10 ppm 
will increase the number of peptide identifications (McAlister 
et al., 2010). The dissociation method is important to note, as 
certain peptides are more amenable to either collisional or 
electron-based fragmentation techniques (Swaney et al., 2008). To 
repeat the identification of phosphosites contained within the 
MPPD, it is important that the same fragmentation conditions be 
used. All possible phosphopeptide sequences are also displayed on 
this page. If the phosphosite was localized, only one sequence will 
appear per line with the modified residue appearing in lowercase. In 
the case of unlocalized phosphosites, all possible phosphopeptides 
will be displayed with each possible phosphosite appearing in 
lowercase. 
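
For readers who want to recompute the precursor mass error from the theoretical 
and experimental masses, a minimal sketch of the usual parts-per-million 
calculation follows. The sign convention (experimental minus theoretical) and the 
function names are assumptions; the database may report the value differently. 

```python
def mass_error_ppm(experimental_mass, theoretical_mass):
    """Deviation of the experimental mass from the theoretical mass, in ppm."""
    return (experimental_mass - theoretical_mass) / theoretical_mass * 1e6


def within_tolerance(experimental_mass, theoretical_mass, tolerance_ppm=10.0):
    """True if the absolute mass error does not exceed the given tolerance."""
    return abs(mass_error_ppm(experimental_mass, theoretical_mass)) <= tolerance_ppm
```

For example, a peptide with a theoretical neutral mass of 1000.000 Da measured at 
1000.005 Da has a mass error of about 5 ppm, which falls inside a 10 ppm window. 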

Assessing sequence homology across species enables researchers 
to infer function of unknown proteins and make connections 
to research published on similar proteins in different systems. 
The MPPD allows users to conduct a BLAST search to query 
homology of proteins within the Medicago protein database with 
a user-provided protein sequence. This search returns all of 
the same protein attributes discussed above, but it adds a link 
to a flat file containing all alignments above the user-provided 
threshold. This flat file contains a list of the proteins matched, 
e-value score of each match, and a visual representation of the 
alignment. 

Phosphorylation motifs can help to identify the kinases that 
may alter the phosphorylation state of a particular amino acid 
(Schwartz and Gygi, 2005). The MPPD allows users to query the 
database by entering a short amino acid sequence or 
phosphorylation motif (e.g., SsLxT). Here, "x" is used as a single 
wild card amino acid while "*" is used for multiple wild card amino 
acids. This search is also case-sensitive, as lowercase letters signify 
a phosphorylated residue. The results are presented in the same 
format as those of a protein description search (Figure 2B), but 
they contain only proteins that have the specified amino acid 
sequence or phosphorylation motif. This feature enables users who 
are interested in a particular kinase to determine if this protein 
kinase contains the motif of interest and if a phosphosite at this 
location was identified in our mass spectrometry analysis. 
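
The motif syntax described above can be applied to downloaded sequences with a 
small pattern translator. The sketch below is not the MPPD implementation; it 
assumes that "x" matches any single residue whether or not it is phosphorylated, 
that "*" matches a run of zero or more residues, and that sequences mark 
phosphorylated residues in lowercase. The function names are hypothetical. 

```python
import re


def motif_to_regex(motif):
    """Translate a motif such as 'SsLxT' into a case-sensitive regex."""
    parts = []
    for ch in motif:
        if ch == "x":
            parts.append("[A-Za-z]")      # single wild card residue
        elif ch == "*":
            parts.append("[A-Za-z]*?")    # zero or more wild card residues
        else:
            parts.append(re.escape(ch))   # literal residue, case preserved
    return re.compile("".join(parts))


def find_motif(motif, annotated_sequence):
    """Return 0-based start positions of non-overlapping motif matches."""
    pattern = motif_to_regex(motif)
    return [m.start() for m in pattern.finditer(annotated_sequence)]
```

Because the match is case-sensitive, the motif SsLxT matches a sequence fragment 
written as SsLAT but not one written as SSLAT, in which the second serine is not 
marked as phosphorylated. 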

BROWSING AND DOWNLOADING DATA 

The MPPD allows users to browse all protein groups contained 
within the database by selecting the "Browse" option in the tools 
menu. As described above, protein grouping occurs when multiple 
entries within the protein database share an identified peptide. To 
explain peptide identifications with the fewest number of proteins, 
these protein identifications are placed into one group. When the 
user selects the browse task, an online table appears, listing each 
identified protein group, including the associated protein number, 
number of proteins in the group, number of peptide identifications 
in the protein group, the p-score for the protein group, and the 
protein description of the longest protein in the protein group. 
As discussed above, the p-score is calculated by multiplying the 
OMSSA p-values of all peptides in the group and is used to 
calculate the false discovery rate at the protein level. Users can 
browse protein entries by changing pages, but to access a protein 
entry, users must copy the protein description and use the 
search tool. 
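
The grouping rule described above can be approximated by treating proteins as 
connected whenever they share an identified peptide. The following sketch uses a 
simple union-find structure for this; it is a simplification under that 
assumption, not the grouping code used to build the MPPD, and the protein and 
peptide identifiers in any call are hypothetical. 

```python
from collections import defaultdict


def group_proteins(peptides_by_protein):
    """Group proteins that share at least one identified peptide.

    peptides_by_protein maps a protein identifier to a set of peptide
    sequences. Returns a sorted list of groups, each a sorted list of
    protein identifiers.
    """
    parent = {protein: protein for protein in peptides_by_protein}

    def find(protein):
        # Follow parent links to the group representative, compressing the path.
        while parent[protein] != protein:
            parent[protein] = parent[parent[protein]]
            protein = parent[protein]
        return protein

    first_owner = {}  # peptide -> first protein seen with that peptide
    for protein, peptides in peptides_by_protein.items():
        for peptide in peptides:
            if peptide in first_owner:
                # Shared peptide: merge this protein's group with the owner's.
                parent[find(protein)] = find(first_owner[peptide])
            else:
                first_owner[peptide] = protein

    groups = defaultdict(set)
    for protein in peptides_by_protein:
        groups[find(protein)].add(protein)
    return sorted(sorted(members) for members in groups.values())
```

Note that this grouping is transitive: if protein A shares a peptide with B, and 
B shares a different peptide with C, all three end up in one group. 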

To download text files containing all protein group, protein, and 
peptide identification data, users can select the download option 
on the tools menu. To download data, right-click on "Group 
Information," "Protein Information," or "Peptide Information" and 
select "Save As." Because the text files do not contain headers, 
the headers associated with each file are listed on the download 
page. 
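
Because the downloaded files carry no header row, column names must be supplied 
when the data are loaded. The sketch below shows one way to do this; it assumes 
the files are tab-delimited, which is not stated here, and the column names in 
the usage note are placeholders to be replaced with the headers listed on the 
download page. 

```python
import csv


def read_headerless_table(lines, headers, delimiter="\t"):
    """Attach column names to rows of a headerless delimited text file.

    lines is any iterable of text lines (for example, an open file).
    Returns a list of dictionaries keyed by the supplied headers.
    """
    rows = []
    for fields in csv.reader(lines, delimiter=delimiter):
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(headers):
            raise ValueError(
                f"expected {len(headers)} columns but found {len(fields)}"
            )
        rows.append(dict(zip(headers, fields)))
    return rows
```

A saved file could then be read by passing an open file handle together with the 
header list, for example `read_headerless_table(handle, ["group", "n_proteins", 
"n_peptides", "p_score"])`, where these four names are hypothetical. 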

CONCLUSION 

As large-scale phosphoproteomic analysis of plant tissues con- 
tinues to become more prevalent, tools are needed to enable 
facile access to data (Jayaraman et al., 2012). Numerous databases 
for proteomic information in plants exist, including databases 
focused on sub-cellular fractions (AMPP, Kruft et al., 2001; 
AraPerox, Reumann et al., 2004; AtNoPDB, Brown et al., 2005; 
SUBA, Heazlewood et al., 2007; PIProt, Kleffmann et al., 2006; 
AT_Chloro, Ferro et al., 2010), single species (SpruceDB, Lippert 
et al., 2009; Soybean Proteome Database, Sakata et al., 2009; 
PhosPhAt, Durek et al., 2010), multiple species (P3DB, Gao et al., 
2009; PPDB, Sun et al., 2009), 2-D gel mapping (GelMap, Rode 
et al., 2011), and spectral data from mass spectrometry experiments 
(ProMex, Hummel et al., 2007). Here we highlight a publicly 
available web portal for Medicago phosphoproteins and phospho- 
peptides. This online database enables researchers to search for 
proteins of interest, BLAST for homologous proteins, search for 
phosphorylation motifs, browse, and download the data. This 
central repository contains all of the information for researchers 
to connect phosphosites identified in Medicago to other legumes, 
providing an invaluable resource for future studies pertaining to 
legume biology and, particularly, to the legume-rhizobia sym- 
biosis. In addition to the data presented here, the Wisconsin 
Medicago Group (University of Wisconsin, Madison) has been 
very active in continuing phosphoproteomic characterization of 
Medicago in response to symbiotic signals. In particular, we are 
pursuing quantitative measurements of phosphorylation dynam- 
ics within Medicago in response to various symbiotic stimuli. As 
we report our results, we will continue to build database tools and 
enable researchers to connect their studies with our results. These 
tools will include the ability to query quantitative information of 
phosphorylation state alterations, allowing researchers to deter- 
mine if proteins of interest are involved in a symbiotic cascade. 
In addition, future database applications will offer the opportu- 
nity for other researchers to upload their own data, creating a 
large, centralized source for all phosphoproteomic data relating 
to legumes. 